NHL goalies are kind of an enigma. They’re notoriously hard to evaluate and predict in a sport that is already so chaotic. They are also incredibly important to their teams and often you will hear hockey analysts citing teams’ goalies as the reason that the team is preforming well or poorly.
We are interested specifically in evaluating NHL goalies for the purpose of understanding their contracts. How do teams decide how much to pay their goalies? We want to inspect goalies in the post-lockout seasons and take a look at the best, the worst and how much they get paid.
The objective of this report is to evaluate NHL goalies and their contracts in the post-lockout era and then build a model to try and predict how much goalies will get paid on their next contract.
We wanted to specifically preform this analysis during the post-lockout era (2013-present). Hockey has changed so much in these seasons, comparing goalies from post-lockout era hockey to pre-lockout doesn’t quite make sense.
How many unique goalies played per season?
hr_data %>%
distinct(player, szn) %>%
count(szn, name = 'Number of goalies') %>%
rename(Season = szn) %>%
kable() %>%
kable_styling(bootstrap_options = c('hover'), full_width = F)
| Season | Number of goalies |
|---|---|
| 12-13 | 82 |
| 13-14 | 97 |
| 14-15 | 92 |
| 15-16 | 92 |
| 16-17 | 95 |
| 17-18 | 95 |
| 18-19 | 93 |
We see that ~95 goalies played each season. This may seem like a lot but this actually makes sense. From 2013-2017 there were 30 NHL teams in the league and from 2017-2019 there were 31. Each NHL team usually has a starting goalie, a back-up goalie, and a third-string “emergency” goalie. Teams calling-up goalies from their farm teams in emergency situations is also common. So you would (assuming that injuries are uniform) that number of teams in league * 3 is the approximate number of goalies who play each season.
In total there were 187 goalies that played at least 1 game in the NHL from 2013-2019. However, it makes no sense to evaluate a goalie based on their performance in just a few games because of the variation between events occuring in each hockey game. One of the many mysteries surrounding goalies is after how many games can we draw definitive conclusions about their quality of play? Let’s look at how many games goalies were playing per season and over their whole careers.
gp_szn <-
ggplot(hr_data, aes(gp)) +
geom_histogram(binwidth = 5, fill = '#a7d2cb', alpha = 0.8) +
labs(x = 'Games played per season by goalies', y = 'Frequency') +
theme(plot.caption = element_text(color = 'grey50'))
gp_overall <-
hr_data %>%
group_by(player) %>%
summarise(gp = sum(gp)) %>%
ggplot(aes(gp)) +
geom_histogram(binwidth = 25, fill = 'thistle', alpha = 0.8) +
labs(x = 'Games played overall by goalies', y = 'Frequency',
caption = '1 Game played = 1 or more minute played on ice') +
theme(plot.caption = element_text(color = 'grey50'))
gp_szn + gp_overall
In the case of this analysis, we were interested specifically in goalies who had played 40 or more games in the NHL from 2013-2019. Our reasoning behind this is that a full NHL season is 82 games so 40 games is about half a “season” played and seems like a reasonably large enough sample to evaluate a goalie’s play. The one downside to this evaluation is that it will devalue the contributions of back-up goalies, but back-up goalies usually don’t last as long as starters anyway and thus, their contracts tend to stay around league minimum.
Fact: In 2018-19, between goalies getting constantly blown out and injured, the Philadelphia Flyers were infamous for their goaltending trouble. They iced 8 goalies during the regular season, the highest of any team post-lockout.
read_csv('data/flyers_goaliedata_19.csv') %>%
clean_names() %>%
arrange(desc(gp)) %>%
select(Player = player, Team = tm, `Games played` = gp, `Save %` = sv_percent) %>%
kable() %>%
kable_styling(bootstrap_options = c('condensed', 'hover'), full_width = F)
| Player | Team | Games played | Save % |
|---|---|---|---|
| Carter Hart | PHI | 31 | 0.917 |
| Brian Elliott | PHI | 26 | 0.907 |
| Anthony Stolarz | PHI | 12 | 0.902 |
| Calvin Pickard | PHI | 11 | 0.863 |
| Michal Neuvirth | PHI | 7 | 0.859 |
| Cam Talbot | PHI | 4 | 0.881 |
| Alex Lyon | PHI | 2 | 0.806 |
| Mike McKenna | PHI | 1 | 0.833 |
It is well-known that most hockey players origniate from North America, Russia, or Scandinavia. Is this true for goalies specifically?
goalie_countries <-
cf_data %>%
count(country, name = "num_of_goalies") %>%
na.omit()
world <-
ne_countries(
scale = "medium",
returnclass = "sf",
continent = c('North America', 'Europe')
)
world %>%
left_join(goalie_countries, by = c('name' = 'country')) %>%
ggplot() +
geom_sf(aes(fill = num_of_goalies)) +
labs(fill = "Number of goalies") +
scale_fill_fish(option = "Prionace_glauca", direction = -1) +
theme_void() +
theme(legend.position = "bottom",
legend.key.height = unit(2, 'mm'),
legend.text = element_text(size = 7),
legend.title = element_text(size = 8))
We can see that this holds true for goalies for the most part, but it’s interesting to note that there are goalies who originate from Denmark and even the UK.
The old hockey idea is that goalies are supposed to be big. The taller and wider they are, the more space they take up in the net. However, with the evolution of hockey skill came the need for more athletic goalies with lightning reflexes. The butterfly style, now commonplace in the NHL, requires goalies to be flexible. So what do goalies look like in the NHL now?
cf_data_player_height_weight <-
cf_data %>%
select(player, weight, height, age, country) %>%
mutate(
weight = parse_number(sub(".*-","", weight)),
height = parse_number(sub(".*-","",height))
) %>%
mutate(age = case_when(
age < 20 ~ 'Under 20',
age >= 20 & age < 25 ~ '20-24',
age >= 25 & age < 30 ~ '25-29',
age >= 30 & age < 35 ~ '30-34',
age >= 35 & age < 40 ~ '35-39',
age > 40 ~ 'Over 40',
))
p <-
cf_data_player_height_weight %>%
ggplot(aes(height, weight)) +
geom_jitter(aes(color = age, label = player, label2 = country),
alpha = 0.65, size = 3) +
labs(x = 'Goalies\'s Height (cm)',
y = 'Goalies\'s Weight (kg)',
color = 'Age') +
scale_color_fish_d(option = 'Callanthias_australis')
ggplotly(p)
So we can see that the largest chunck of goalies seems to fall between 183 - 193cm (6’0 - 6’3 ft) in height and 85 - 95kg (187 - 210lbs) in weight. So only slightly taller and heavier than the average adult male, which is what you would expect from a professional athelete but is not quite what you would expect if you tend to think of goalies as big players.
hr_data <- hr_data %>% mutate(w_percent = w/gp)
Salaries <- cf_data %>%
group_by(player) %>%
summarise(mean_aav = mean(aav)) %>%
mutate(player = replace(player,player == "Marc-André Fleury","Marc-Andre Fleury")) %>%
mutate(player = replace(player,player == "Jaroslav Halák","Jaroslav Halak")) %>%
mutate(player = replace(player, player == "Eddie Läck","Eddie Lack")) %>%
mutate(player = replace(player, player == "Jacob Markström","Jacob Markstrom")) %>%
mutate(player = replace(player, player == "Petr Mrázek","Petr Mrazek"))
Best_Goalies <-
hr_data %>%
filter(!(player == "Martin Jones" & szn == "14-15")) %>%
group_by(player) %>%
summarise(gp = sum(gp),
w = sum(w),
ga = sum(ga),
sa = sum(sa),
sv = sum(sv),
) %>%
filter(gp > 120) %>%
mutate(mean_w_percent = w/gp,
mean_sv_percent = sv/sa,
mean_gaa = ga/gp,
avg_sv = mean(mean_sv_percent),
mean_gsaa = (sa * (1-avg_sv))- ga,
)
Best_Goalies <- Best_Goalies %>%
left_join(Salaries, by = c("player" = "player"))
Mean_sv_plot <-
Best_Goalies %>%
top_n(10, mean_sv_percent) %>%
ggplot(aes(x = mean_sv_percent, y = mean_aav)) +
geom_point(size = 2, color = '#a7d2cb') +
geom_text_repel(aes(label = player), size = 3) +
labs(x = "Save Percentage", y = "Average Annual Salary")
Mean_gaa_plot <-
Best_Goalies %>%
top_n(-10, mean_gaa) %>%
ggplot(aes(x = mean_gaa, y = mean_aav)) +
geom_point(size = 2, color = '#f2d388') +
geom_text_repel(aes(label = player), size = 3) +
labs(x = "Goals Against Average", y = "Average Annual Salary")
mean_w_percent_plot <-
Best_Goalies %>%
top_n(10, mean_w_percent) %>%
ggplot(aes(x = mean_w_percent, y = mean_aav)) +
geom_point(size = 2, color = '#c98474') +
geom_text_repel(aes(label = player), size = 3) +
labs(x = "Percentage of Games Won", y = "Average Annual Salary")
mean_gsaa_plot <-
Best_Goalies %>%
top_n(10, mean_gsaa) %>%
ggplot(aes(x = mean_gsaa,y = mean_aav))+
geom_point(size = 2, color = '#874c62') +
geom_text_repel(aes(label = player), size = 3) +
labs(x = "Goals Saved Above Average", y = "Average Annual Salary")
(Mean_sv_plot | Mean_gaa_plot) / (mean_w_percent_plot | mean_gsaa_plot)
# Finding the best goalies by Season
# creates tables for each stat we are looking at
top_goalie_sv <- function(season){
top_goalie_stats <- hr_data %>% filter(szn == season, gp > 40)
top_goalie_sv_percent <- top_goalie_stats %>% top_n(1,sv_percent) %>%
select(player, sv_percent, szn)
return(top_goalie_sv_percent)
}
top_goalie_gaa <- function(season){
top_goalie_stats <- hr_data %>%
filter(szn == season, gp > 40)
goalie_gaa <- top_goalie_stats %>%
top_n(-1,gaa) %>%
select(player, gaa, szn)
return(goalie_gaa)
}
top_goalie_w_percent <- function(season){
top_goalie_stats <- hr_data %>% filter(szn == season, gp > 40)
goalie_w_percent <- top_goalie_stats %>% top_n(1,w_percent) %>%
select(player, w_percent, szn)
return(goalie_w_percent)
}
top_goalie_gsaa <- function(season){
top_goalie_stats <- hr_data %>% filter(szn == season, gp > 40)
goalie_gsaa <- top_goalie_stats %>% top_n(1,gsaa) %>%
select(player, gsaa, szn)
return(goalie_gsaa)
}
#Loop to fill in the tables
best_sv_year <- tibble()
best_gaa_year <- tibble()
best_win_percent_year <- tibble()
best_gsaa_year <- tibble()
for (i in 1:7){
best_sv_year <- bind_rows(best_sv_year,top_goalie_sv(hr_szns[i]))
best_gaa_year <- bind_rows(best_gaa_year,top_goalie_gaa(hr_szns[i]))
best_win_percent_year <- bind_rows(best_win_percent_year,top_goalie_w_percent(hr_szns[i]))
best_gsaa_year <- bind_rows(best_gsaa_year,top_goalie_gsaa(hr_szns[i]))
}
#Plots to see best goalies by season
sv_plot <-
best_sv_year %>%
ggplot(aes(x = szn, y = sv_percent)) +
geom_point(size = 2, color = '#a7d2cb') +
geom_text_repel(aes(label = player), size = 3) +
coord_flip() +
labs(x = "Season", y = "Save Percentage")
gaa_plot <-
best_gaa_year %>%
ggplot(aes(x =szn, y = gaa)) +
geom_point(size = 2, color = '#f2d388') +
geom_text_repel(aes(label = player), size = 3) +
coord_flip() +
labs(x = "Season", y = "Goals Against Average")
w_percent_plot <-
best_win_percent_year %>%
ggplot(aes(x = szn, y = w_percent)) +
geom_point(size = 2, color = '#c98474') +
geom_text_repel(aes(label = player), size = 3) +
coord_flip() +
labs(x = "Season", y = "Win Percentage")
gsaa_plot <-
best_gsaa_year %>%
ggplot(aes(x = szn, y = gsaa)) +
geom_point(size = 2, color = '#874c62') +
geom_text_repel(aes(label = player), size = 3) +
coord_flip() +
labs(x = "Season", y = "Goals Saved Above Average")
year_plots <- (sv_plot | gaa_plot) / (w_percent_plot | gsaa_plot)
year_plots
# + plot_annotation(
# title = "Top Goalies by Season"
# )
#Worst Goalies by averages overall
worst_mean_sv_plot <- Best_Goalies %>%
top_n(-10,mean_sv_percent) %>%
ggplot(aes(x = mean_sv_percent, y = mean_aav)) +
geom_point(size = 2, color = '#a7d2cb') +
geom_text_repel(aes(label = player), size = 3) +
labs(x = "Save Percentage", y = "Average Annual Salary")
worst_mean_gaa_plot <- Best_Goalies %>%
top_n(10,mean_gaa) %>%
ggplot(aes(x = mean_gaa, y = mean_aav)) +
geom_point(size = 2, color = '#f2d388') +
geom_text_repel(aes(label = player), size = 3) +
labs(x = "Goals Against Average", y = "Average Annual Salary")
worst_mean_w_percent_plot <- Best_Goalies %>%
top_n(-10, mean_w_percent) %>%
ggplot(aes(x = mean_w_percent, y = mean_aav)) +
geom_point(size = 2, color = '#c98474') +
geom_text_repel(aes(label = player), size = 3) +
labs(x = "Percentage of Games Won", y = "Average Annual Salary")
worst_mean_gsaa_plot <- Best_Goalies %>%
top_n(-10, mean_gsaa) %>%
ggplot(aes(x = mean_gsaa,y = mean_aav))+
geom_point(size = 2, color = '#874c62') +
geom_text_repel(aes(label = player), size = 3) +
labs(x = "Goals Saved Above Average", y = "Average Annual Salary")
Worst_average_plots <- (worst_mean_sv_plot | worst_mean_gaa_plot) / (worst_mean_w_percent_plot | worst_mean_gsaa_plot)
Worst_average_plots
# Some really bad goalies by year
missing_salary <-
read_xls("data/MH_nhl_goalies_2017-2018.xls") %>%
clean_names() %>%
unite('player', c('first_name', 'last_name'), sep = ' ') %>%
filter(player == "Scott Darling") %>%
select(player, salary)
worst_goalies <- hr_data %>%
filter(gp > 40 & sv_percent < 0.9)
worst_salaries <- cf_data %>% select(player, aav, szn)
worst_goalies <- worst_goalies %>%
left_join(worst_salaries,by = c("player" = "player", "szn" = "szn")) %>%
left_join(missing_salary, by = c("player" = "player")) %>%
mutate(aav = replace_na(aav,0)) %>%
mutate(salary = replace_na(salary,0)) %>%
mutate(aav = aav+ salary) %>%
unite('player_szn', c('player','szn'), sep = ' | Season:', remove = FALSE) %>%
select(-salary) %>%
mutate(w_percent = w/gp)
wg_sv_percent <- worst_goalies %>%
ggplot(aes(x = sv_percent, y = aav)) +
geom_point(size = 2, color = '#a7d2cb') +
geom_text_repel(aes(label = player), size = 3) +
labs(x = "Save Percentage", y = "Annual Average Salary")
wg_gsaa <- worst_goalies %>% ggplot(aes(x = gsaa, y = aav)) +
geom_point(size = 2, color = '#f2d388') +
geom_text_repel(aes(label = player), size = 3) +
labs(x = "Goals Saved Above Average", y = "Annual Average Salary")
wg_gaa <- worst_goalies %>% ggplot(aes(x = gaa, y = aav)) +
geom_point(size = 2, color = '#c98474') +
geom_text_repel(aes(label = player), size = 3) +
labs(x = "Goals Against Average", y = "Annual Average Salary")
wg_w_percent <- worst_goalies %>%
ggplot(aes(x = w_percent, y = aav)) +
geom_point(size = 2, color = '#874c62') +
geom_text_repel(aes(label = player), size = 3) +
labs(x = "Win Percentage", y = "Annual Average Salary")
Worst_goalies_plot <- (wg_sv_percent | wg_gaa) / (wg_w_percent | wg_gsaa)
Worst_goalies_plot
# + plot_annotation(
# title = "Worst perfoming goalies and how much they get paid"
# )
Red point is league average.
league_avgs <-
hr_data %>%
filter(gp > 20) %>%
group_by(szn) %>%
mutate(w_percent = w/gp) %>%
summarise(
avg_sv_percent = mean(sv_percent),
avg_gaa = mean(gaa),
avg_w_percent = mean(w_percent),
avg_gsaa = mean(gsaa)
)
starting_goalies_best <-
standings_data %>%
filter(rk == 1) %>%
select(-w, -l, -ol, -pts) %>%
left_join(read_csv('data/NHLTeams.csv')) %>%
left_join(hr_data, by = c('abbrev' = 'team', 'szn' = 'szn')) %>%
group_by(team, szn) %>%
top_n(1, gp) %>%
mutate(w_percent = w/gp)
starting_goalies_sv_percent <-
ggplot() +
geom_point(data = starting_goalies_best,
aes(x = szn, y = sv_percent, color = team),
show.legend = F, size = 4) +
geom_point(data = league_avgs,
aes(x = szn, y = avg_sv_percent),
show.legend = F, size = 3, color = 'red3') +
geom_text_repel(data = starting_goalies_best,
aes(x = szn, y = sv_percent, label = player),
size = 3) +
labs(y = 'Save percent', x = 'Season') +
ggtitle('Save percent') +
scale_color_fish_d(option = "Callanthias_australis") +
coord_flip() +
theme_hc()
starting_goalies_gaa <-
ggplot() +
geom_point(data = starting_goalies_best,
aes(x = szn, y = gaa, color = team),
show.legend = F, size = 4) +
geom_point(data = league_avgs,
aes(x = szn, y = avg_gaa),
show.legend = F, size = 3, color = 'red3') +
geom_text_repel(data = starting_goalies_best,
aes(x = szn, y = gaa, label = player),
size = 3) +
labs(y = 'Goals against avg', x = NULL) +
ggtitle('Goals against average') +
scale_color_fish_d(option = "Callanthias_australis") +
coord_flip()
starting_goalies_comb <- starting_goalies_sv_percent / starting_goalies_gaa
starting_goalies_comb